Abstract. Protection times provided by 31 synthetic repellents against Aedes aegypti mosquitoes were correlated with the chemical structures of these repellents using Codessa Pro software. Two statistically significant quantitative models with R² values of ca. 0.80 are presented and discussed.
Repellents are materials that disrupt the natural behavior of blood-seeking insects and other organisms; they provide personal protection and represent the first line of defense for humans and animals against biting. A well-known standard repellent is N,N-diethyl-3-methylbenzamide, or N,N-diethyl-m-toluamide (DEET, compound 7, Table 1).¹ However, there is now an urgent need to find repellents that are more effective than DEET.

Few attempts have previously been made to apply QSAR modeling to repellent activities. One reason is that most of the extensive testing carried out so far¹ has yielded only semi-quantitative data. An exception is the work of Suryanarayana et al.,² who measured a set of 31 repellents and proposed the correlation of Eq. 1, where log P is the lipophilicity, Vp the vapor pressure, and ML the molecular length, and a, b, c, and d are constants.

$$\mathrm{PT} = a\log P + b\log V_p + c\log \mathrm{ML} + d \tag{1}$$

However, Eq. 1 has a low correlation coefficient, R = 0.551 (corresponding to R² = 0.304). In addition, one of its descriptors is the measured vapor pressure, which has to be obtained experimentally before Eq. 1 can be used to predict the activity of unknown compounds.
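To make the form of Eq. 1 concrete, the following Python sketch fits PT = a log P + b log Vp + c log ML + d by ordinary least squares with NumPy and reports R and R². The function name `fit_eq1` and the toy input values are hypothetical illustrations, not the data or code of Suryanarayana et al.; the last line only checks the arithmetic relation between R = 0.551 and R² ≈ 0.304 quoted above.

```python
import numpy as np

def fit_eq1(log_p, log_vp, log_ml, pt):
    """Least-squares fit of PT = a*logP + b*logVp + c*logML + d (form of Eq. 1).

    Returns the coefficients (a, b, c, d), the correlation coefficient R
    between observed and fitted PT, and R squared.
    """
    # Design matrix: one column per descriptor plus a column of ones for d.
    X = np.column_stack([log_p, log_vp, log_ml, np.ones(len(pt))])
    coef, *_ = np.linalg.lstsq(X, pt, rcond=None)
    fitted = X @ coef
    r = np.corrcoef(pt, fitted)[0, 1]
    return coef, r, r ** 2

# Hypothetical toy data generated exactly from a = 1.0, b = -0.5, c = 2.0, d = 3.0.
log_p = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 2.5])
log_vp = np.array([-2.0, -3.0, -2.5, -4.0, -3.5, -4.5])
log_ml = np.array([1.0, 1.2, 0.9, 1.3, 1.1, 1.4])
pt = 1.0 * log_p - 0.5 * log_vp + 2.0 * log_ml + 3.0

coef, r, r2 = fit_eq1(log_p, log_vp, log_ml, pt)
print(np.round(coef, 3), round(r, 3), round(r2, 3))

# For a least-squares fit with an intercept, R^2 is the square of R:
print(round(0.551 ** 2, 3))  # 0.304
```

With real data the fitted R would of course be well below 1; the toy data are noise-free only so that the recovered coefficients can be checked.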

Other authors³ have also suggested that vapor pressure and boiling point are related to repellent activity; repellency is lost at vapor concentrations below a certain minimum.⁴,⁵ Factors including evaporation from human skin, skin absorption, and penetration clearly influence repellent bioassays.⁶ Test-related factors (such as the mosquito species used, the cage size, and the mosquito density) also affect repellent bioassays.⁷

Ma et al.⁴ discussed the data set of Suryanarayana et al. and postulated that the amide group makes an important contribution to potent repellent activity, but reported no numerical correlation. The same group explored the molecular similarity between insect juvenile hormone and DEET analogues, but did not explore any quantitative correlation with structure.⁸

The present QSAR study correlates the mosquito repellent activity (protection time, PT) reported by Suryanarayana et al.² with theoretical molecular descriptors; we have also examined repellency using vapor pressure as an external descriptor, in view of the importance attributed to it by earlier workers.

The methodology for a general QSAR approach has previously been incorporated in the Codessa Pro⁹ software package, which enables the calculation of numerous
Table 1. DEET (compound 7) and DEPA analogues

ID  Compound           PT (h)  Ring  R        R1, R2       CAS nr.      Exp Vp (Torr)  Pred. log Vp
                                              R1 = R2
1   o-Chlorobenzamide  5       a     2-Cl     CH3          6526-67-6    9.63E-04       -2.94
2   Cyclohexamide      3       s     H        CH3          17566-51-7   0.0225         -1.77
3   m-Toluamide        3       a     3-CH3    CH3          6935-65-5    2.18E-03       -2.42
4   o-Ethoxybenzamide  2.83    a     2-OC2H5  CH3          90526-02-6   1.65E-04       -3.75
5   Benzamide          1.67    a     H        CH3          611-74-5     0.0157         -2.02
6   p-Anisamide        1       a     4-OCH3   CH3          7291-00-1    4.38E-04       -3.35
7   m-Toluamide        5       a     3-CH3    C2H5         134-62-3     1.35E-03       -3.25
8   Benzamide          4       a     H        C2H5         1696-17-9    2.30E-03       -2.79
9   Cyclohexamide      4       s     H        C2H5         5461-52-9    2.92E-03       -2.34
10  o-Ethoxybenzamide  3.5     a     2-OC2H5  C2H5         3688-82-2    2.58E-05       -4.61
11  p-Toluamide        2.83    a     4-CH3    C2H5         2728-05-4    3.65E-04       -3.25
12  p-Anisamide        1       a     4-OCH3   C2H5         7465-86-3    7.22E-05       -4.17
13  Benzamide          3       a     H        i-C3H7       14657-86-4   3.14E-04       -3.62
14  m-Toluamide        2.67    a     3-CH3    i-C3H7       5448-37-3    5.10E-05       -4.11
15  Cyclohexamide      2       s     H        i-C3H7       67013-94-9   3.70E-04       -3.60
16  p-Anisamide        1.17    a     4-OCH3   i-C3H7       349397-58-6  1.07E-05       -5.02
17  o-Ethoxybenzamide  1.08    a     2-OC2H5  i-C3H7       5442-04-6    3.70E-06       -5.49
18  o-Chlorobenzamide  1       a     2-Cl     i-C3H7       349397-59-7  2.39E-05       -4.53
19  p-Toluamide        0.5     a     4-CH3    i-C3H7       5448-37-3    5.10E-05       -4.12
                                              R1, R2                                   -3.16
20  m-Toluamide        0.67    a     3-CH3    H, C2H5      26819-07-8   1.85E-03       -2.86
21  Benzamide          0.58    a     H        H, C2H5      614-17-5     3.95E-04       -3.07
22  Cyclohexamide      0.5     s     H        H, C2H5      138324-59-1  8.66E-04       -3.08
23  p-Toluamide        0.08    a     4-CH3    H, C2H5      26819-08-9   7.67E-04       -3.98
24  p-Anisamide        0.08    a     4-OCH3   H, C2H5      7403-41-0    1.29E-04       -4.33
25  o-Ethoxybenzamide  0.08    a     2-OC2H5  H, C2H5      99985-68-9   5.68E-05       -3.76
                                              N, R1, R2                                -3.66
26  Benzamide          3       a     H        Piperidine   776-75-0     3.16E-04       -4.19
27  Cyclohexamide      2       s     H        Piperidine   7103-46-0    1.56E-04       -4.66
28  m-Toluamide        1.42    a     3-CH3    Piperidine   13290-48-7   4.41E-05       -4.21
29  o-Chlorobenzamide  1       a     2-Cl     Piperidine   22342-21-8   1.94E-05       -5.12
30  p-Toluamide        1       a     4-CH3    Piperidine   13707-23-8   4.90E-05       -2.94
31  p-Anisamide        0.75    a     4-OCH3   Piperidine   57700-94-4   8.81E-06       -1.77

quantitative descriptors from the molecular structural formula.¹⁰,¹¹ Codessa Pro has previously been used to correlate successfully numerous physical properties,¹² including chromatographic retention times and response features, melting and boiling points, solvent scales, and refractive indexes.¹³ Recent examples include correlations for (i) binding energies of 1:1 complexation systems of organic guests with β-cyclodextrin,¹⁴ (ii) the in vitro minimum inhibitory concentration (MIC) of 3-aryloxazolidin-2-one antibacterials,¹⁵ and (iii) partition coefficients of medicinal drugs between human breast milk and plasma.¹⁶

We correlated the 31 protection times (PT) determined by Suryanarayana et al.,² who tested the compounds at a dose of 1 mg/cm² applied to the external surface of a human hand, followed by exposure to 200 female Aedes aegypti mosquitoes (5-7 days old). The PT is defined as the period of protection until two consecutive bites are made within a 30 min interval (PT values are given in hours in Tables 1 and 5). The reported protection times represent averages of multiple determinations. The compound data set consists of 31 amide analogues of N,N-diethyl-m-toluamide (DEET) and N,N-diethylphenylacetamide (DEPA) (see Table 1).
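The PT definition above can be read as a simple rule over a time-ordered bite record. The sketch below encodes one reading of it, assuming that protection ends at the first bite of the first pair of consecutive bites no more than 30 min apart; the function `protection_time` and the bite times are hypothetical and are not taken from the original bioassay protocol.

```python
from typing import Optional, Sequence

def protection_time(bite_times_min: Sequence[float], window_min: float = 30.0) -> Optional[float]:
    """Return the protection time in minutes, or None if protection never broke.

    Assumed reading of the definition: protection ends at the first bite of the
    first pair of consecutive bites that occur within `window_min` minutes.
    """
    times = sorted(bite_times_min)
    for first, second in zip(times, times[1:]):
        if second - first <= window_min:
            return first
    return None

# Hypothetical bite record: an isolated bite at 200 min does not end protection;
# the pair at 260 and 275 min (15 min apart) does.
pt_min = protection_time([200.0, 260.0, 275.0])
print(pt_min)  # 260.0
print(pt_min / 60)  # converted to hours, the unit used in Tables 1 and 5
```

Whether the boundary case (exactly 30 min) counts, and whether the first or second bite of the pair marks the end point, are assumptions that would need to be checked against the original protocol.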

Conformational searches were carried out for all 31 structures using the AMBER2 force field for molecular mechanics (MM) optimization as implemented in the HyperChem software,¹⁷ in an attempt to obtain the lowest-energy conformer within a reasonable computational time. Depending on the number of free torsion angles in each molecule, numerous conformers (between 100 and 200) were found by the MM optimizations. These optimizations were terminated when a gradient of 0.01 kcal/(Å·mol) was reached for a given conformer. The lowest-energy conformer of each molecule was then subjected to semi-empirical quantum-mechanical AM1 calculations¹⁸ to calculate the molecular characteristics. The optimized structures were loaded into Codessa Pro, and more than 740 theoretical descriptors were calculated. These descriptors can be classified into several groups: (i) constitutional, (ii) topological, (iii) geometrical, (iv) thermodynamic, (v) quantum chemical, and (vi) charge-related. The stepwise regression algorithm¹⁹ implemented in Codessa Pro was used to select significant descriptors for building multilinear QSAR models. The treatment started with a reduction of the number of molecular descriptors: if two descriptors were highly intercorrelated, only one of them was retained, and descriptors with insignificant variance over the data set were also rejected. This speeds up the descriptor selection and reduces the probability of including unrelated descriptors by chance. The 'best multilinear regression' (BMLR) approach implemented in Codessa Pro provides the QSAR equation that best fits the experimental data in terms of the Fisher criterion (F) and the cross-validation coefficient R²cv.
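The descriptor-reduction step described above can be sketched as a greedy filter: drop near-constant columns, then keep a descriptor only if it is not highly intercorrelated with one already kept. This is a minimal illustration, not Codessa Pro's implementation; the function name `prefilter_descriptors`, the thresholds `min_variance` and `max_r2`, and the rule of keeping the earlier of two correlated columns are all assumptions.

```python
import numpy as np

def prefilter_descriptors(D, names, min_variance=1e-8, max_r2=0.8):
    """Drop near-constant descriptors and one member of each highly
    intercorrelated pair (greedy, keeping the earlier column).

    D: (n_compounds, n_descriptors) array; names: list of column names.
    min_variance and max_r2 are illustrative thresholds, not Codessa Pro's.
    """
    D = np.asarray(D, dtype=float)
    variances = D.var(axis=0)
    kept = []
    for j in range(D.shape[1]):
        if variances[j] < min_variance:
            continue  # insignificant variance over the data set
        too_similar = any(
            np.corrcoef(D[:, j], D[:, k])[0, 1] ** 2 > max_r2 for k in kept
        )
        if not too_similar:
            kept.append(j)
    return [names[j] for j in kept], D[:, kept]

# Hypothetical descriptor pool for 31 compounds.
x = np.arange(31, dtype=float)
D = np.column_stack([x, 2 * x + 1, np.full(31, 5.0), (x - 15) ** 2])
names = ["d0", "d0_scaled", "constant", "d0_curved"]
kept_names, D_kept = prefilter_descriptors(D, names)
print(kept_names)  # ['d0', 'd0_curved']
```

Here `d0_scaled` is a linear copy of `d0` and is removed, `constant` has zero variance, and `d0_curved` is uncorrelated with `d0` over this symmetric range, so it survives.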

A major decision in developing successive QSAR models is when to stop adding descriptors during the stepwise regression procedure. A simple technique for controlling model expansion is the so-called 'breaking point' in the improvement of the statistical quality of the model, found by plotting the squared correlation coefficient (R²) of the obtained models against the number of descriptors they contain. Frequently, the statistical improvement of the regression model becomes less significant (ΔR² < 0.02-0.04) after a certain number of independent variables has been included (the 'breaking point'). Consequently, the model corresponding to the breaking point is considered the optimum model.
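The breaking-point rule amounts to scanning the R² values of the best models with 1, 2, 3, ... descriptors and stopping once the next descriptor adds less than a chosen ΔR². A minimal sketch, where `breaking_point`, the threshold `min_gain`, and the R² sequence are hypothetical (the text above only gives the 0.02-0.04 range for ΔR²):

```python
def breaking_point(r2_by_size, min_gain=0.02):
    """Number of descriptors at the 'breaking point'.

    r2_by_size[i] is the R^2 of the best model with i + 1 descriptors.
    Returns the smallest size after which adding one more descriptor
    improves R^2 by less than min_gain (or the largest size tried).
    """
    for i in range(len(r2_by_size) - 1):
        if r2_by_size[i + 1] - r2_by_size[i] < min_gain:
            return i + 1
    return len(r2_by_size)

# Hypothetical R^2 sequence for models with 1..6 descriptors.
r2_seq = [0.30, 0.55, 0.70, 0.78, 0.80, 0.81]
print(breaking_point(r2_seq, min_gain=0.03))  # 4
```

The choice of `min_gain` within the 0.02-0.04 range is a judgment call; a plot such as Figure 1 makes the same decision visually.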

Another important step in QSAR modeling is validation of the obtained model. Internal validation of the best model obtained with Codessa Pro was carried out as follows: (i) the parent data points (31) were divided into three subsets (A-C): the first, fourth, seventh, etc., data points went into subset A; the second, fifth, eighth, etc., into subset B; and the third, sixth, ninth, etc., into subset C; (ii) three training sets were prepared as the combinations of two subsets, (A and B), (A and C), and (B and C), with the remaining subsets (C, B, and A, respectively) serving as the corresponding test sets; and (iii) a correlation equation was derived for each training set using the same descriptors (but different regression coefficients). Each equation was then used to predict the protection time values for the compounds in the corresponding test set.
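The threefold procedure above can be sketched in NumPy as follows: subsets are formed by taking every third data point, each pair of subsets is used to refit the same descriptors, and the held-out subset is predicted. The function names, the toy data, and the use of 1 - SS_res/SS_tot over the test set as "R² (pred)" are assumptions for illustration; Codessa Pro may compute its predicted R² differently.

```python
import numpy as np

def threefold_subsets(n):
    """Indices of subsets A, B, C: points 1, 4, 7, ... go to A, and so on."""
    idx = np.arange(n)
    return {"A": idx[0::3], "B": idx[1::3], "C": idx[2::3]}

def fit_ols(X, y):
    """Multilinear least-squares fit with an intercept; returns [b0, b1, ...]."""
    Xa = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef

def threefold_validation(X, y):
    """Fit on each pair of subsets, predict the third; report sizes and R^2(pred)."""
    subsets = threefold_subsets(len(y))
    results = {}
    for test in ("C", "B", "A"):  # training sets A+B, A+C, B+C
        train_idx = np.sort(np.concatenate([v for name, v in subsets.items() if name != test]))
        test_idx = subsets[test]
        coef = fit_ols(X[train_idx], y[train_idx])  # same descriptors, new coefficients
        pred = coef[0] + X[test_idx] @ coef[1:]
        y_test = y[test_idx]
        ss_res = np.sum((y_test - pred) ** 2)
        ss_tot = np.sum((y_test - y_test.mean()) ** 2)
        results[test] = (len(train_idx), len(test_idx), 1.0 - ss_res / ss_tot)
    return results

# Hypothetical data: 31 compounds, 2 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=31)
for test_set, (n_train, n_test, r2_pred) in threefold_validation(X, y).items():
    print(test_set, n_train, n_test, round(r2_pred, 3))
```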

Another validation used in this study is the leave-one-out approach,²⁰ which was performed for the main models. The ability of the QSAR equations to predict protection time was thus estimated from criteria such as the cross-validation coefficient R²cv. We also divided the parent data set to provide an external test set consisting of every fifth compound, and used the remaining 26 compounds as a training set to obtain a 4-descriptor model (R² = 0.83), with which the external set was then tested. It gave a satisfactory R²pred = 0.76.
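A leave-one-out cross-validated coefficient of the kind referred to above is commonly computed as 1 - PRESS/SS_tot, where PRESS sums the squared errors of predicting each compound from a model refitted without it. The sketch below assumes this common definition; whether Codessa Pro's R²cv is computed in exactly this way is not stated here, and the function names and toy data are hypothetical.

```python
import numpy as np

def loo_r2cv(X, y):
    """Leave-one-out cross-validated R^2 (often written q^2 or R^2cv):
    1 - PRESS / total sum of squares, refitting without each point in turn."""
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(Xa[mask], y[mask], rcond=None)
        press += (y[i] - Xa[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def fit_r2(X, y):
    """Ordinary (in-sample) R^2 of the same multilinear model."""
    Xa = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical data: 31 compounds, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(31, 3))
y = 0.5 + X @ np.array([1.5, -0.7, 0.3]) + 0.3 * rng.normal(size=31)
print(round(fit_r2(X, y), 3), round(loo_r2cv(X, y), 3))
```

For least-squares models the cross-validated value can never exceed the fitted R², which is why the R²cv column in the tables is always the smaller of the two.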

The best statistical model obtained using Codessa Pro descriptors for the PT data is shown in Table 2. This model includes four descriptors, listed in descending order of statistical significance (t test). In Table 2, X and ΔX are the regression coefficients and their standard errors. The co-linearity of any pair of the descriptors is less than …; therefore, the model descriptors can be considered sufficiently orthogonal. The number of descriptors was selected according to the breaking-point rule for the improvement of R², as demonstrated in Figure 1.
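The columns X, ±ΔX, and t test in Tables 2-4 follow the usual least-squares relations: ΔX is the square root of the diagonal of s²(XᵀX)⁻¹ and t = X/ΔX. A minimal NumPy sketch, with a hypothetical function `regression_table` and toy data standing in for the Codessa Pro descriptors:

```python
import numpy as np

def regression_table(X, y):
    """Multilinear OLS with intercept: coefficients X, standard errors dX and t = X/dX."""
    n, k = X.shape
    Xa = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ coef
    s2 = resid @ resid / (n - k - 1)             # residual variance
    cov = s2 * np.linalg.inv(Xa.T @ Xa)          # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Hypothetical data: 31 compounds, 2 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(31, 2))
y = 3.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.normal(size=31)
coef, se, t = regression_table(X, y)
for name, c, e, tv in zip(["Intercept", "D1", "D2"], coef, se, t):
    print(f"{name:9s} {c:8.3f} {e:7.3f} {tv:8.2f}")
```

The published values are consistent with t = X/ΔX; for D1 in Table 2, -86.2/10.5 ≈ -8.2, against the tabulated -8.19.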

There are several treatments of Vp in the literature suggesting that the vapor pressure correlates well with the protection time.³,⁶ Because of the importance of vapor pressure indicated by previous workers, we tested whether the use of Vp as a descriptor would improve the correlation. It was clearly not appropriate to use the measured Vp, since an equation including such a descriptor could not be used conveniently for predictive


Figure 1. Breaking-point rule for determining the number of descriptors.


Table 2. The best 4-descriptor QSAR model with R² = 0.78, N = 31, F = 23.9, and s² = 0.51

No.  X       ±ΔX    t test  R²     R²cv  s²    Descriptor(a)
0    21.1    2.09   10.1                       Intercept
1    -86.2   10.5   -8.19   0.16   0.08  1.78  Principal moment of inertia A, D1
2    -0.93   0.13   -6.89   0.54   0.46  1.00  Structure information content (order 0), D2
3    -0.99   0.23   -4.28   0.71   0.63  0.65  Kier and Hall index (order 2), D3
4    -2.66   0.91   -2.91   0.79   0.70  0.51  Total hybridization component of the molecular dipole, D4

(a) Descriptor definitions are given in Supplementary material.
purposes. The dependence of Vp on structure varies somewhat with the type of compound considered; therefore, a new QSPR model for the vapor pressure was derived specifically for the data set of 31 DEET analogues.

Experimental vapor pressures (Vp) of these compounds were taken from the SciFinder catalog.²¹ Using Codessa Pro in the usual manner gave the 3-descriptor model for the vapor pressure shown in Table 3, with the following statistical characteristics: R² = 0.956, R²cv = 0.940, F = 198.82, and s² = 0.041.
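For a least-squares model with an intercept, the Fisher criterion quoted with each model is determined by R², the number of compounds N, and the number of descriptors k: F = (R²/k)/[(1 - R²)/(N - k - 1)]. The small sketch below (the function name `f_statistic` is hypothetical) can be used to cross-check the reported statistics.

```python
def f_statistic(r2, n, k):
    """Fisher F for a multilinear model with k descriptors fitted to n points."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

print(round(f_statistic(0.80, 31, 4), 1))   # 26.0, matching F = 26 for the Table 4 model
print(round(f_statistic(0.956, 31, 3), 1))  # 195.5 for the Table 3 model
```

The second value differs slightly from the reported F = 198.82, presumably because R² is quoted to only three decimals.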

The most significant descriptor according to the t test in Table 3 is D5, the gravitation index calculated over all bonds of the molecule. With descriptor D5 alone, the model R² is already 0.80. In addition, the combination of descriptors D5 and D6 makes the equation similar to the more general model developed previously²² on the basis of 411 compounds. All this suggests that the QSPR equation in Table 3 is reliable and can be used for adequate prediction of the vapor pressure.


Next, the vapor pressures predicted by the Table 3 relationship (see Table 1) were used as external descriptors in the common descriptor pool in an attempt to improve the Table 2 model for the protection times. Two functions of the predicted Vp were constructed: (i) log Vp and (ii) (log Vp)². After these descriptors were loaded into the full Codessa Pro descriptor set, the BMLR algorithm was run again to build the models. The best 4-descriptor model found among 744 descriptors is shown in Table 4; this equation includes (log Vp)² as an independent variable. Moreover, a model with only two descriptors (including (log Vp)²) already gave a high correlation, R² = 0.70, as can be seen from Table 4.
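Adding the two external descriptors amounts to appending log Vp and its square as extra columns of the descriptor matrix before rerunning the selection. A minimal sketch using the predicted log Vp of compounds 1-3 from Table 1; the function name and the one-column placeholder matrix `D` are hypothetical stand-ins for the Codessa Pro descriptor pool.

```python
import numpy as np

def add_vapor_pressure_descriptors(D, pred_log_vp):
    """Append log Vp and (log Vp)^2 columns to a descriptor matrix D."""
    pred_log_vp = np.asarray(pred_log_vp, dtype=float)
    return np.column_stack([D, pred_log_vp, pred_log_vp ** 2])

# Predicted log Vp for compounds 1-3 (Table 1) and a placeholder descriptor column.
pred_log_vp = [-2.94, -1.77, -2.42]
D = np.zeros((3, 1))  # hypothetical stand-in for the real descriptor pool
D_ext = add_vapor_pressure_descriptors(D, pred_log_vp)
print(D_ext[:, -1])
```

A squared term lets a model that is linear in its descriptors represent a curved (for example, maximum-shaped) dependence of PT on log Vp.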

Tables 2 and 4 show that these two models are close from a statistical point of view. However, the equation in Table 4 is better than that of Table 2 in terms of R², s², and F. The values of the statistical parameters also show that the models are robust and describe the experimental data well. Table 5 collects the protection times predicted by each of the models


Table 3. Three-parameter model for the vapor pressure (log Vp) based on 31 compounds

No.  X       ±ΔX    t test  R²     R²cv  s²    Descriptor(b)
0    10.49   1.01   10.41                      Intercept
1    0.01    3e-4   -23.19  0.80   0.77  0.18  Gravitation index (all bonds), D5
2    -26.51  2.88   -9.21   0.88   0.85  0.11  H-donors FPSA (version 2), D6
3    0.43    0.06   7.01    0.96   0.94  0.04  Total molecular 2-center resonance energy / no. of atoms, D7

(b) Descriptor definitions are given in Supplementary material.

Table 4. The best 4-parameter model with the calculated descriptor (log Vp)²: R² = 0.80, F = 26, and s² = 0.47

No.  X       ±ΔX    t test  R²     R²cv  s²    Descriptor(c)
0    41.10   11.74  3.49                       Intercept
1    -77.09  8.61   -8.95   0.165  0.08  1.77  Principal moment of inertia A, D1
2    -0.25   0.03   -8.86   0.70   0.64  0.65  (log Vp)², D8
3    0.41    0.10   3.95    0.77   0.67  0.51  HA-dependent HDSA-2 (Zefirov), D9
4    -44.62  15.08  -2.95   0.80   0.72  0.47  Minimum atomic orbital electronic population, D10

(c) Descriptor definitions are given in Supplementary material.

Table 5. Predicted protection times (PT) in hours

ID  Exp. PT  Pred. PT-2  Pred. PT-4
1   5        4.49        4.24
2   3        2.13        2.98
3   3        3.37        2.80
4   2.83     3.61        3.50
5   1.67     2.69        2.16
6   1        0.92        1.32
7   5        3.66        3.71
8   4        3.84        3.32
9   4        3.45        3.81
10  3.5      2.84        3.24
11  2.83     2.93        3.16
12  1        1.65        2.24
13  3        2.65        2.93
14  2.67     1.71        2.15
15  2        2.35        2.52
16  1.17     0.59        0.57
17  1.08     0.71        1.00
18  1        1.88        1.21
19  0.5      1.73        1.47
20  0.67     0.95        1.68
21  0.58     0.74        0.71
22  0.5      0.84        0.97
23  0.08     0.12        0.31
24  0.08     0           -0.30
25  0.08     -0.59       -1.28
26  3        2.96        2.19
27  2        1.2         1.28
28  1.42     2.03        1.39
29  1        2.15        1.63
30  1        1.61        1.57
31  0.75     0.57        0.92

Pred. PT-2: predicted using the model in Table 2.
Pred. PT-4: predicted using the model in Table 4.
given in Tables 2 and 4. Graphical presentations of these predictions are provided in Figures 2 and 3.
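As a consistency check on the reported statistics, the sketch below recomputes R² and s² for the Table 4 model directly from the experimental and predicted PT values listed in Table 5. The arrays are copied from Table 5; taking s² = SS_res/(N - k - 1) with N = 31 and k = 4 is the standard convention and is assumed here.

```python
import numpy as np

# Experimental PT and PT predicted by the Table 4 model (Pred. PT-4), IDs 1-31, from Table 5.
exp_pt = np.array([5, 3, 3, 2.83, 1.67, 1, 5, 4, 4, 3.5, 2.83, 1, 3, 2.67, 2, 1.17,
                   1.08, 1, 0.5, 0.67, 0.58, 0.5, 0.08, 0.08, 0.08, 3, 2, 1.42, 1, 1, 0.75])
pred_pt4 = np.array([4.24, 2.98, 2.80, 3.50, 2.16, 1.32, 3.71, 3.32, 3.81, 3.24, 3.16,
                     2.24, 2.93, 2.15, 2.52, 0.57, 1.00, 1.21, 1.47, 1.68, 0.71, 0.97,
                     0.31, -0.30, -1.28, 2.19, 1.28, 1.39, 1.63, 1.57, 0.92])

n, k = len(exp_pt), 4                       # 31 compounds, 4 descriptors
ss_res = np.sum((exp_pt - pred_pt4) ** 2)   # residual sum of squares
ss_tot = np.sum((exp_pt - exp_pt.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
s2 = ss_res / (n - k - 1)                   # residual variance, n - k - 1 degrees of freedom
print(round(r2, 3), round(s2, 3))
```

With the two-decimal values of Table 5 this gives R² ≈ 0.80 and s² ≈ 0.48, in line with the R² = 0.80 and s² = 0.47 reported for the Table 4 model.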

It can be noted from both figures and from Table 5 that the PT values of compounds 24 and 25 are predicted to be zero or negative. However, these compounds are not outliers with respect to the model error (standard deviation). This is possible because BMLR is not an algorithm constrained by the range of the experimental values.

Figure 2. Experimental versus predicted PT according to the model in Table 2.

Figure 3. Experimental versus predicted PT according to the model in Table 4.

To test the predictive power of the models, an internal threefold cross-validation was performed on the current data set. The results are shown in Table 6; the subsets A-C were formed as explained above. The superior robustness of the Table 4 model is also evident from Table 6.

The descriptors involved in the models could possibly be explained as follows: (i) D1 and D2, which are molecular-shape-related descriptors, represent the fit of the repellent into a receptor active center; (ii) D5 describes the chemical reaction of the repellent with a receptor active center. Basically, the repellent activity quantified by the protection time can be assigned to the influence of three main molecular interactions. First, vaporization is connected with the length of time during which a mosquito can be in contact with the repellent. As shown above and in previous models of vapor pressure,²² molecular size and shape descriptors such as D1 and D2 play a determining role in the vapor pressure of compounds. The second important characteristic is the structural fit to an unknown active receptor center. The third kind of interaction should relate to the chemical reaction with a receptor, resulting in the act of repelling. Again, this should be directly related to the shape and size descriptors of this QSAR model (D1 and D2). The interaction between the active compound and its biological counterpart can also be reflected by D3, which is connected to the shape and branching of the compound, and by descriptor D4, which characterizes the charge distribution in the compound. The dipole moment indicates the intrinsic polarity of the molecule. Its magnitude is also a good indicator of lipophilicity and hydrophobicity: the larger its magnitude, the higher the hydrophilicity.⁴

Regarding functional groups and structural correlations, all compounds in the data set include an O atom. The descriptors D9 and D10 are connected to the hydrogen-donor capabilities of the molecule and to the orbital electronic population, which in turn could influence the protection time of the repellent.³ The most active compounds appear to be those with an aromatic ring bearing one substituent (CH3 or Cl). Examination of the respective descriptor values for such compounds showed a tendency for the D8 and D1 values to be low. Replacing the aromatic ring with an alicyclic one was usually slightly deleterious to PT. These structural criteria could be used as guidance for the synthesis of active repellents.

Table 6. Internal validation of the QSAR models

Training set  N   R² (fit)  R²cv (fit)  s² (fit)  Test set  N   R² (pred)  s² (pred)
Model in Table 2
A + B         20  0.84      0.72        0.43      C         11  0.71       0.80
A + C         21  0.83      0.73        0.51      B         10  0.60       0.97
B + C         21  0.72      0.67        0.59      A         10  0.82       0.62
Average           0.80      0.71        0.51                    0.71       0.79
Model in Table 4
A + B         20  0.78      0.74        0.58      C         11  0.91       0.51
A + C         21  0.82      0.72        0.54      B         10  0.87       0.72
B + C         21  0.82      0.70        0.39      A         10  0.83       1.02
Average           0.81      0.72        0.51                    0.87       0.75

The two equations have one descriptor in common: the principal moment of inertia A. This descriptor is likely related linearly to the experimental data, and it reflects the mass distribution of the molecule (see Supplementary material for the descriptor definition). Finally, the principal moment of inertia A has the largest t values in both equations (t values define the statistical significance of a descriptor).
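Principal moments of inertia are the eigenvalues of the inertia tensor computed about the center of mass. The sketch below returns all three in ascending order from Cartesian coordinates and atomic masses (units of amu·Å² if coordinates are in Å and masses in amu). Under the common convention I_A ≤ I_B ≤ I_C, moment A would be the first value returned; whether Codessa Pro labels its "principal moment of inertia A" the same way is an assumption to check against the descriptor definition in the Supplementary material. The function name and the diatomic example are hypothetical.

```python
import numpy as np

def principal_moments(coords, masses):
    """Principal moments of inertia (ascending) from Cartesian coordinates and atomic masses."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = masses @ coords / masses.sum()      # center of mass
    r = coords - center
    # Inertia tensor: I = sum_i m_i (|r_i|^2 * E - r_i r_i^T)
    second_moment = np.einsum("i,ij,ik->jk", masses, r, r)
    inertia = np.eye(3) * np.trace(second_moment) - second_moment
    return np.sort(np.linalg.eigvalsh(inertia))

# Two unit masses at x = -1 and x = +1: no inertia about the bond axis.
print(principal_moments([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]], [1.0, 1.0]))
```

For a real repellent, the coordinates would come from the AM1-optimized geometry described earlier.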

In contrast to the model in Table 2, the model in Table 4 reaches R² = 0.70 with just two descriptors; the addition of the external descriptor (log Vp)² markedly improves the quality of the fit. We also tested the lipophilicity log P (octanol-water partition coefficient) as an additional external descriptor; however, its inclusion did not lead to a better QSAR model.

Two QSAR models with satisfactory statistical characteristics were developed for the description of mosquito repellent protection times (PT). Each model includes four descriptors that are linearly related to the protection times of the 31 repellents. External descriptors such as log Vp and its square were added to the descriptor pool because the vapor pressure is an important factor for PT.

An additional QSPR model was developed for log Vp, so that no experimental data are needed to predict the PT of compounds in this data set. Examination of this descriptor space revealed that the descriptor (log Vp)² is statistically significant and improves the model quality considerably. In addition, the descriptors that appear in the models, which reflect the shape and volume as well as the charge distribution of the compounds, are likely important in determining the activity of the repellents. The PTs predicted for compounds 8 and 9, whose experimental PT values are lower than that of DEET (7), are comparable to or slightly higher than those predicted for DEET (see Table 5). However, the main prediction trend of these equations follows the experimental data within the error limits.

The success of the present work suggests that a general QSPR treatment of repellents could be of great benefit to synthetic efforts aimed at discovering better compounds for practical use.